\section{Experimental Results}
\label{experiment}

\subsection{Data Set and Experimental Setup}
 \begin{figure}[htb] 
\begin{minipage}[b]{1.0\linewidth}
 \centering
 \centerline{\includegraphics[scale=0.5]{data_set}}
%  \vspace{2.0cm}
\end{minipage}
\caption{The experimental data set: the top row shows positive samples (heads) and the bottom row shows negative samples (non-heads).}
\label{fig:data_set}
%%
\end{figure}
The videos used in our experiment were provided by MCG stadium. Each frame is $640\times480$ pixels and the frame rate is 30\,fps. We manually crop head and non-head regions (Fig.~\ref{fig:data_set}). To make detection more difficult, the non-head samples are taken from moving regions, including shadows, other body parts, and strongly illuminated areas.
%The crop images are annotated manually and resized to $32*32$.  
We collect 571 positive samples and 429 negative samples from 16 video recordings. Three feature types are extracted from each sample: HOG, HOOF, and the combined HOG-HOOF, and several bin numbers are tested for each type. We use \textit{libsvm}~\cite{libsvm} to build the classifier. Because SVM performance depends strongly on its parameters, we perform a grid search over different kernels and their parameters to find the optimal setting. After training, the classifiers are evaluated with a leave-one-out strategy. Table~\ref{table:experiment_result} shows the experimental results. In the table, the kernel entry gives the best kernel selected by the grid search, and three metrics are reported for comparison: precision, recall, and F-score.
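The kernel and parameter search can be sketched with libsvm's option strings. In libsvm, \texttt{-t 0} selects the linear kernel $K(\mathbf{x},\mathbf{z})=\mathbf{x}^\top\mathbf{z}$, \texttt{-t 1} the polynomial kernel $(\gamma\,\mathbf{x}^\top\mathbf{z}+r)^d$ (degree $d=2$ via \texttt{-d 2}), and \texttt{-t 2} the RBF kernel $\exp(-\gamma\|\mathbf{x}-\mathbf{z}\|^2)$. The sketch is only an illustration under stated assumptions: the grid $C\in\{2^{-5},2^{-3},\dots,2^{15}\}$, $\gamma\in\{2^{-15},2^{-13},\dots,2^{3}\}$ is the coarse grid suggested in the libsvm practical guide, not necessarily the one used in our experiments, and the helpers \texttt{libsvm\_option\_grid} and \texttt{grid\_search} are hypothetical. Scoring each candidate with libsvm's cross-validation option \texttt{-v} $N$, where $N$ is the number of samples, yields leave-one-out accuracy; using that as the grid-search criterion is also our assumption.

\begin{verbatim}
from typing import Callable, List, Sequence, Tuple


def libsvm_option_grid() -> List[str]:
    """Hypothetical helper: libsvm option strings for a coarse kernel grid.

    Assumed grid: C = 2^-5, 2^-3, ..., 2^15 and gamma = 2^-15, 2^-13, ..., 2^3.
    '-s 0' selects C-SVC; '-q' silences libsvm's training output.
    """
    costs = [2.0 ** e for e in range(-5, 16, 2)]
    gammas = [2.0 ** e for e in range(-15, 4, 2)]
    options = []
    for c in costs:
        # Linear kernel (-t 0) has no gamma to tune.
        options.append(f"-s 0 -t 0 -c {c:g} -q")
        for g in gammas:
            # Degree-2 polynomial kernel (-t 1 -d 2); coef0 stays at libsvm's default.
            options.append(f"-s 0 -t 1 -d 2 -c {c:g} -g {g:g} -q")
            # RBF kernel (-t 2).
            options.append(f"-s 0 -t 2 -c {c:g} -g {g:g} -q")
    return options


def grid_search(evaluate: Callable[[str], float],
                options: Sequence[str]) -> Tuple[str, float]:
    """Hypothetical helper: return the option string with the highest score."""
    if not options:
        raise ValueError("options must not be empty")
    best_opt, best_score = options[0], evaluate(options[0])
    for opt in options[1:]:
        score = evaluate(opt)
        if score > best_score:  # strict '>' keeps the first option on ties
            best_opt, best_score = opt, score
    return best_opt, best_score


# Usage with libsvm's official Python interface (not run here); x and y are
# the feature vectors and +1/-1 labels. With '-v N', svm_train returns the
# N-fold cross-validation accuracy, which is leave-one-out when N = len(y).
#
#   from libsvm.svmutil import svm_train
#   loo = lambda opt: svm_train(y, x, f"{opt} -v {len(y)}")
#   best_opt, best_acc = grid_search(loo, libsvm_option_grid())
\end{verbatim}

Leave-one-out scoring trains one model per sample for every grid point, so with about a thousand samples a coarse grid followed by a finer search around the best point keeps the cost manageable.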
 \begin{figure}[htb] 
\begin{minipage}[b]{1.0\linewidth}
 \centering
 \centerline{\includegraphics[scale=0.5]{kernel_comparison1}}
%  \vspace{2.0cm}
\end{minipage}
\caption{Comparison between different SVM kernels on HOOF-6}
\label{fig:kernel_comp}
%%
\end{figure}
%\begin{table} [htbp]
%\caption{Performance of HOOF, HOG, and HOOFHOG Human Detector } \label{table:result}
%\setlength{\tabcolsep}{4pt}
%\centering
%\small
%\begin{tabular} {c | r r r }
%Feature & Bin=9 & Bin=6 & Bin=4\\
%\hline
%   HOOF   &80.5\%&81.1\%&80.4\%\\
%   HOG   &61.4\%&60.4\%&59.2\%\\
%   HOOFHOG    &78\%&78.3\%&78.1\%\\
%\end{tabular}
%\end{table}	
\subsection{Human Detection Results}

As shown in Table~\ref{table:experiment_result}, HOOF performs best among the three features: it reaches 88\% accuracy with high precision and recall. HOG-HOOF is a close second at around 86\%. HOG alone reaches only 65\% accuracy, although its recall is moderate. These results indicate that the HOOF feature is more robust and accurate than HOG alone. A likely reason is the low video quality: the people in the footage are blurred, often far from the camera, and heavily affected by noise. People standing far away can only be detected from motion information, because their appearance is not discriminative enough.
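The accuracy, precision, recall, and F-score used above can all be computed from the pooled leave-one-out predictions by counting true and false positives and negatives. A minimal sketch, assuming labels $+1$ (head) and $-1$ (non-head) and taking the F-score to be the harmonic mean of precision and recall ($F_1$); the function name \texttt{detection\_metrics} is hypothetical.

\begin{verbatim}
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> Dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # heads found
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed heads
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall (0 when both are 0).
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_score": f_score}
\end{verbatim}

For example, true labels $(1,1,1,-1,-1)$ with predictions $(1,1,-1,1,-1)$ give $TP=2$, $FP=1$, $FN=1$, $TN=1$, hence accuracy $0.6$ and precision, recall, and $F_1$ all equal to $2/3$.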

Performance differs little across bin numbers, so we recommend 6 bins as a trade-off between descriptive power and computational cost. The kernel comparison in Fig.~\ref{fig:kernel_comp} shows that the RBF kernel performs better than the other two, while the degree-2 polynomial kernel is very competitive. The linear kernel is also worth considering when computing resources are constrained.
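To make the bin-number choice concrete, a HOOF descriptor for one region can be sketched as follows. This section does not spell out the HOOF binning, so the sketch assumes the common formulation: each optical-flow vector $(u,v)$ is assigned to one of $B$ bins by its angle $\theta=\mathrm{atan2}(v,|u|)\in[-\pi/2,\pi/2]$, so that leftward and rightward motion share a bin, it is weighted by its magnitude $\sqrt{u^2+v^2}$, and the histogram is normalized to sum to one. The bin index is $\lfloor(\theta+\pi/2)B/\pi\rfloor$, clamped to $B-1$. The flow field is assumed to be computed beforehand by some optical-flow method, and \texttt{hoof\_histogram} is a hypothetical name.

\begin{verbatim}
import math
from typing import Iterable, List, Tuple


def hoof_histogram(flow: Iterable[Tuple[float, float]],
                   num_bins: int = 6) -> List[float]:
    """Hypothetical helper: magnitude-weighted HOOF histogram of (u, v) vectors."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    hist = [0.0] * num_bins
    bin_width = math.pi / num_bins
    for u, v in flow:
        magnitude = math.hypot(u, v)
        if magnitude == 0.0:
            continue  # zero flow carries no orientation
        # Using |u| folds the angle into [-pi/2, pi/2], so left and right
        # motion with the same vertical component land in the same bin.
        angle = math.atan2(v, abs(u))
        b = int((angle + math.pi / 2) / bin_width)
        hist[min(max(b, 0), num_bins - 1)] += magnitude  # angle = pi/2 -> last bin
    total = sum(hist)
    # Normalize to sum 1 so the descriptor is independent of total flow magnitude.
    return [h / total for h in hist] if total > 0.0 else hist
\end{verbatim}

Changing \texttt{num\_bins} only changes the descriptor length, which is why a moderate value such as 6 keeps most of the orientation information while reducing the SVM input dimension.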

